---
title: Create monitoring jobs
description: Use the job definition UI to create monitoring jobs, allowing DataRobot to monitor deployments that run and store feature data and predictions outside of DataRobot.
---

# Create monitoring jobs via the UI

In addition to the Prediction API, you can create monitoring job definitions through the DataRobot UI. You can then [view and manage](manage-monitoring-job-def) monitoring job definitions as you would any other job definition.

To create a monitoring job in DataRobot:

1. Click **Deployments** and select a deployment from the inventory.

2. On the selected deployment's **Overview**, click **Job Definitions**.

3. On the **Job Definitions** page, click **Monitoring Jobs**, and then click **Add Job Definition**.

4. On the **New Monitoring Job Definition** page, configure the following options:

    ![](images/monitoring-job-definition.png)

    |                        | Field name |  Description  |
    |------------------------|------------|---------------|
    | ![](images/icon-1.png) | Monitoring job definition name | Enter the name of the monitoring job that you are creating for the deployment. |
    | ![](images/icon-2.png) | Monitoring data source | Set the [source type](#set-monitoring-data-source) and [define the connection](data-conn) for the data to be scored.  |
    | ![](images/icon-3.png) | Monitoring options | Configure [monitoring options](#set-monitoring-options) and [aggregation options](#set-aggregation-options).  |
    | ![](images/icon-4.png) | Data destination | (Optional) [Configure the data destination options](#set-output-monitoring-and-data-destination-options) if you enable output monitoring. |
    | ![](images/icon-5.png) | Jobs schedule | Configure whether to run the job immediately and whether to [schedule the job](#schedule-monitoring-jobs).|
    | ![](images/icon-6.png) | Save monitoring job definition | Click this button to save the job definition. The button changes to **Save and run monitoring job definition** if **Run this job immediately** is enabled. Note that this button is disabled if there are any validation errors. |
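The options collected in the table above can be sketched as a single configuration payload. The structure below is illustrative only; the field names (`name`, `intakeSettings`, `monitoringColumns`, `outputSettings`, `schedule`) and values are assumptions for illustration, not DataRobot's actual job-definition schema.

```python
# Illustrative sketch of the information a monitoring job definition
# collects; field names and values are assumptions, not DataRobot's
# actual job-definition schema.
job_definition = {
    "name": "Daily scoring-table monitor",        # 1. job definition name
    "intakeSettings": {                           # 2. monitoring data source
        "type": "snowflake",
        "table": "SCORED_LOANS",
    },
    "monitoringColumns": {                        # 3. monitoring options
        "associationIdColumn": "loan_id",
        "predictionsColumn": "prediction",
        "actualsValueColumn": "actual",
    },
    "outputSettings": None,                       # 4. optional data destination
    "schedule": {"hour": [2], "minute": [0]},     # 5. job schedule (02:00 daily)
}
```

Each numbered comment corresponds to a numbered area of the **New Monitoring Job Definition** page; the sections below describe each area in detail.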

## Set monitoring data source {: #set-monitoring-data-source }

Select a monitoring source, called an [intake adapter](intake-options), and complete the appropriate authentication workflow for the source type. Select a connection type below to view field descriptions:

!!! note
    When browsing for connections, invalid adapters are not shown.

**Database connections**

* [JDBC](intake-options#jdbc-scoring)

**Cloud storage connections**

* [Azure](intake-options#azure-blob-storage-scoring)
* [GCP](intake-options#google-cloud-storage-scoring) (Google Cloud Platform Storage)
* [S3](intake-options#amazon-s3-scoring)

**Data warehouse connections**

* [BigQuery](intake-options#bigquery-scoring)
* [Snowflake](intake-options#snowflake-scoring)
* [Synapse](intake-options#synapse-scoring)

**Other**

* [AI Catalog](intake-options#ai-catalog-dataset-scoring) 

After you set your monitoring source, DataRobot validates that the data is applicable to the deployed model.

!!! note
    DataRobot validates that a data source is compatible with the model when possible, but not in all cases. Validation is supported for AI Catalog, most JDBC connections, Snowflake, and Synapse.

## Set monitoring options {: #set-monitoring-options }

The available monitoring options depend on the model type: regression or classification.

=== "Regression models"

    ![](images/monitoring-options-regression.png)

    Option                   |  Description
    -------------------------|-------------
    Association ID column    | Identifies the column in the data source containing the association ID for predictions.
    Predictions column       | Identifies the column in the data source containing prediction values. You must provide this column, the **Actuals value column**, or both.
    Actuals value column     | Identifies the column in the data source containing actual values. You must provide this column, the **Predictions column**, or both.
    Actuals timestamp column | Identifies the column in the data source containing the timestamps for actual values.

=== "Classification models"

    ![](images/monitoring-options-classification.png)

    Option                   |  Description
    -------------------------|-------------
    Association ID column    | Identifies the column in the data source containing the association ID for predictions.
    Predictions column       | Identifies the columns in the data source containing each prediction class. You must provide these columns, the **Actuals value column**, or both.
    Actuals value column     | Identifies the column in the data source containing actual values. You must provide this column, the **Predictions column**, or both.
    Actuals timestamp column | Identifies the column in the data source containing the timestamps for actual values.

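As a concrete illustration of the columns these options reference, a monitored data source might pair each prediction with an association ID and a later-arriving actual value. The layout and column names below are assumptions for illustration, not required names:

```python
import csv
import io

# Hypothetical scored-data layout matching the monitoring options above:
# an association ID, the model's prediction, the later-arriving actual,
# and the timestamp at which the actual was recorded.
# Column names are illustrative, not required by DataRobot.
rows = [
    {"loan_id": "A-1001", "prediction": 0.82, "actual": 1, "actual_ts": "2024-05-01T09:30:00Z"},
    {"loan_id": "A-1002", "prediction": 0.17, "actual": 0, "actual_ts": "2024-05-01T10:05:00Z"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["loan_id", "prediction", "actual", "actual_ts"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
```

In this sketch, `loan_id` would be the **Association ID column**, `prediction` the **Predictions column**, `actual` the **Actuals value column**, and `actual_ts` the **Actuals timestamp column**.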
## Set aggregation options {: #set-aggregation-options }

To support challengers for external models with [large-scale monitoring](agent-use#enable-large-scale-monitoring) enabled (meaning that raw data isn't stored in the DataRobot platform), you can report a small sample of raw feature and prediction data and send the remaining data in aggregate form. Enable **Use aggregation** and configure the retention settings to indicate that the MLOps library aggregates raw data and to define how much raw data is retained for challengers.

!!! important "Autosampling for large-scale monitoring"
    To automatically report a small sample of raw data for challenger analysis and accuracy monitoring, you can define the `MLOPS_STATS_AGGREGATION_AUTO_SAMPLING_PERCENTAGE` when [enabling large-scale monitoring for an external model](agent-use#enable-large-scale-monitoring).

![](images/aggregation-options.png)

Property          | Description
------------------|------------
Retention policy  | Determines whether the **Retention value** represents a number of **Samples** or a **Percentage** of the dataset.
Retention value   | The amount of data to retain, either a percentage of data or the number of samples.

If you define these properties, the MLOps library aggregates raw data, so that data isn't stored in the DataRobot platform. Aggregation supports only feature and prediction data, not the actuals data required for accuracy monitoring. If you've defined one or more of the **Association ID column**, **Actuals value column**, or **Actuals timestamp column**, DataRobot cannot aggregate data; conversely, if you enable the **Use aggregation** option, the association ID and actuals-related fields are disabled.
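The two retention policies determine the retained raw-row count in different ways. The helper below is a sketch of that arithmetic; the function name and policy strings are illustrative, not part of any DataRobot API:

```python
# Sketch of how the two retention policies determine how many raw rows
# are kept for challengers. The function name and policy strings are
# illustrative only.
def rows_to_retain(total_rows: int, policy: str, value: float) -> int:
    if policy == "samples":      # Retention value = a fixed number of samples
        return min(int(value), total_rows)
    if policy == "percentage":   # Retention value = a percentage of the dataset
        return int(total_rows * value / 100)
    raise ValueError(f"unknown policy: {policy!r}")

# A 2% retention value on a 50,000-row dataset keeps 1,000 raw rows;
# the remainder is reported only in aggregate.
retained = rows_to_retain(50_000, "percentage", 2)
```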

![](images/aggregation-options-note.png)

!!! note "Public Preview: Accuracy monitoring with aggregation"
    Now available for public preview, monitoring jobs for external models with aggregation enabled can support accuracy tracking. With this feature, when you enable **Use aggregation** and configure the retention settings, you can also define the **Actuals value column** for accuracy monitoring; however, you must also define the **Predictions column** and **Association ID column**.

    **Feature flag OFF by default**: Enable Accuracy Aggregation

## Set output monitoring and data destination options {: #set-output-monitoring-and-data-destination-options }

After setting the prediction and actuals monitoring options, you can enable **Output monitoring status** and configure the following options:

![](images/output-monitoring-status.png)


Option                        |  Description
------------------------------|-------------
Monitored status column        | Identifies the column in the data destination containing the monitoring status for each row.
Unique row identifier columns | Identifies the columns from the data source to serve as unique identifiers for each row. These columns are copied to the data destination to associate each monitored status with its corresponding source row.
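The rows written to the data destination combine the unique row identifier columns copied from the source with a per-row status. The sketch below illustrates that shape only; the column names and status values are assumptions, not DataRobot's actual output format:

```python
# Illustrative shape of rows written to the data destination when
# "Output monitoring status" is enabled: the unique row identifier
# columns copied from the source, plus a per-row monitoring status.
# Column names and status values are assumptions for illustration.
source_rows = [
    {"loan_id": "A-1001", "batch": "2024-05-01"},
    {"loan_id": "A-1002", "batch": "2024-05-01"},
]
statuses = ["MONITORED", "FAILED"]

destination_rows = [
    {**ids, "monitoring_status": status}
    for ids, status in zip(source_rows, statuses)
]
```

Here `loan_id` and `batch` play the role of the **Unique row identifier columns**, and `monitoring_status` plays the role of the **Monitored status column**.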

With **Output monitoring status** enabled, you must also configure the **Data destination** options to specify where the monitored data results should be stored. Select a monitoring data destination, called an [output adapter](output-options), and complete the appropriate authentication workflow for the destination type. Select a connection type below to view field descriptions:

!!! note
    When browsing for connections, invalid adapters are not shown.

**Database connections**

* [JDBC](output-options#jdbc-write)
    
**Cloud storage connections**

* [Azure](output-options#azure-blob-storage-write)
* [GCP](output-options#google-cloud-storage-write) (Google Cloud Platform Storage)
* [S3](output-options#amazon-s3-write)

**Data warehouse connections**

* [BigQuery](output-options#bigquery-write)
* [Snowflake](output-options#snowflake-write)
* [Synapse](output-options#azure-synapse-write)

**Other**

* [Tableau](output-options#tableau-write)  

## Schedule monitoring jobs {: #schedule-monitoring-jobs }

You can schedule monitoring jobs to run automatically. When creating a monitoring job definition, enable **Run this job automatically on a schedule**, then specify the frequency (daily, hourly, monthly, etc.) and time of day to define the schedule on which the job runs.

![](images/batch-7.png)

For further granularity, select **Use advanced scheduler** to set the exact time, to the minute, at which the monitoring job runs.

![](images/batch-8.png)
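The difference between the two modes can be sketched with cron-style fields. The dictionary shapes below are illustrative only, not DataRobot's actual schedule schema:

```python
# Illustrative cron-style schedules in the spirit of the scheduler UI;
# the dictionary shapes are assumptions, not DataRobot's schedule schema.
basic = {"frequency": "daily", "hour": 9}    # run every day at 09:00

# "Use advanced scheduler" grants minute-level control:
advanced = {
    "minute": [15],           # at :15
    "hour": [6, 18],          # at 06:15 and 18:15
    "day_of_month": ["*"],
    "month": ["*"],
    "day_of_week": ["*"],
}

def runs_at(schedule, hour, minute):
    """Check whether the advanced-style schedule fires at a given time."""
    return hour in schedule["hour"] and minute in schedule["minute"]
```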

After setting all applicable options, click **Save monitoring job definition**. 

